Learn how to benchmark embedding models on your own data in this course for beginners.
In this course, you will learn:
- The limitations of extracting text from PDF files with Python libraries, and how to overcome them with VLMs (Vision Language Models).
- How to divide the extracted text into chunks that preserve context.
- How to generate questions for each chunk using LLMs (Large Language Models).
- How to use embedding models to create vector representations of the chunks and questions.
- How to use both open source and proprietary embedding models.
- How to use llama.cpp to run models in the GGUF format locally on your machine.
- How to benchmark different embedding models using various metrics and statistical tests with the help of ranx.
- How to plot the vector representations to see whether clusters form.
- How to interpret the p-value that a statistical test provides.
- And much more!
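To give a feel for the benchmarking step, here is a minimal sketch in plain Python. It is not the course's code and does not use ranx: it assumes each question was generated from exactly one chunk, ranks chunks by cosine similarity to each question, and reports the mean reciprocal rank (MRR). The toy vectors, the ids, and the helper names (`cosine`, `rank_chunks`, `mean_reciprocal_rank`) are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_chunks(question_vec, chunk_vecs):
    """Return chunk ids sorted from most to least similar to the question."""
    return sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine(question_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )


def mean_reciprocal_rank(question_vecs, chunk_vecs, gold):
    """Average of 1 / rank of the source chunk over all questions."""
    total = 0.0
    for question_id, question_vec in question_vecs.items():
        ranking = rank_chunks(question_vec, chunk_vecs)
        rank = ranking.index(gold[question_id]) + 1  # ranks start at 1
        total += 1.0 / rank
    return total / len(question_vecs)


# Toy 2-dimensional "embeddings" (hypothetical values, not real model output).
chunk_vecs = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [1.0, 1.0]}
question_vecs = {"q1": [0.9, 0.1], "q2": [0.2, 0.8]}

# Which chunk each question was generated from (the expected top result).
gold = {"q1": "c1", "q2": "c3"}

mrr = mean_reciprocal_rank(question_vecs, chunk_vecs, gold)
print(f"MRR: {mrr:.2f}")
```

In a real benchmark, each embedding model would produce its own set of vectors, and the per-question scores from two models could then be compared with a statistical test to see whether the difference is likely due to chance.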
You can find the slides, notebook, and scripts in this GitHub repository:
The dataset is available here:
To connect with Imad Saddik, check out his social accounts:
LinkedIn:
YouTube:
Website:
⭐️ Course Contents ⭐️
(0:00:00) About the course
(0:06:05) Introduction
(0:17:58) Extracting text from PDF documents
(1:01:08) Divide text into coherent chunks
(1:23:10) Generate question-answer pairs from text chunks
(1:38:48) Embed text chunks and questions
(2:17:06) Statistical tests and metrics
(3:12:01) Expanding the dataset and adding more languages
(3:45: